Ward’s method (also known as the minimum variance method or Ward’s minimum variance clustering method) is an alternative to single-link clustering. It is popular in fields such as linguistics because it tends to produce compact, evenly sized clusters. Like most clustering methods, it is computationally intensive, although it needs noticeably fewer computations than some alternatives. The trade-off is that merges are chosen greedily, one step at a time, so the resulting clusters are not guaranteed to be optimal. In practice, they are usually good enough for most purposes.
Like other agglomerative clustering methods, Ward’s method starts with n clusters, each containing a single object. These clusters are merged step by step until a single cluster contains all objects. At each step, the method merges the pair of clusters whose union gives the smallest increase in within-cluster variance, measured by an index called E (also called the sum of squares index).
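The merge rule can be checked on a small example. The following is a minimal sketch in base R, not the code used for the analysis in this document: the toy data and the helper names `ess` and `merge_cost` are made up for illustration, and `hclust(..., method = "ward.D2")` is assumed as one way to run Ward's method on Euclidean distances.

```r
# Toy data (hypothetical): two well-separated groups of 10 points in 2-D
set.seed(1)
x <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 3, sd = 0.3), ncol = 2)
)

# E for one cluster: sum of squared deviations from the cluster centroid
ess <- function(m) sum(scale(m, center = TRUE, scale = FALSE)^2)

# Increase in E caused by merging clusters a and b
merge_cost <- function(a, b) ess(rbind(a, b)) - ess(a) - ess(b)

a <- x[1:10, , drop = FALSE]
b <- x[11:20, , drop = FALSE]

# The increase in E computed directly ...
cost_direct <- merge_cost(a, b)

# ... equals the size-weighted squared distance between the two centroids
cost_formula <- (nrow(a) * nrow(b)) / (nrow(a) + nrow(b)) *
  sum((colMeans(a) - colMeans(b))^2)

# Full agglomeration: at each step the pair with the smallest increase is merged
hc <- hclust(dist(x), method = "ward.D2")
groups <- cutree(hc, k = 2)
table(groups)
```

Because the increase in E depends only on cluster sizes and centroid distance, each candidate merge can be scored without recomputing E from scratch.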
The k-medoids, or partitioning around medoids (PAM), algorithm is a clustering algorithm reminiscent of k-means. Both are partitional (they break the dataset up into groups), and both attempt to minimize the distance between the points assigned to a cluster and a point designated as the center of that cluster. In k-medoids, that center is an actual observation (the medoid) rather than a computed mean.
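A short comparison may make the difference concrete. This is a sketch under assumptions: it uses toy numeric data rather than the sequence data of this document, and it assumes the recommended `cluster` package is installed for `pam()`. Because PAM only needs pairwise dissimilarities, the same call pattern would in principle work with a dissimilarity matrix between sequences.

```r
library(cluster)  # assumption: the recommended 'cluster' package is available

# Toy data (hypothetical): two well-separated groups of 10 points in 2-D
set.seed(2)
x <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 3, sd = 0.3), ncol = 2)
)
d <- dist(x)

# k-medoids on a dissimilarity object: each center is one of the observations
pam_fit <- pam(d, k = 2, diss = TRUE)
pam_fit$id.med             # row indices of the two medoids
table(pam_fit$clustering)  # cluster sizes

# k-means for comparison: centers are coordinate means, so it needs the raw data
km_fit <- kmeans(x, centers = 2, nstart = 10)
km_fit$centers
```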
##        [-> 1] [-> 2] [-> 3] [-> 4] [-> 5]
## [1 ->]   0.81   0.09   0.04   0.05   0.02
## [2 ->]   0.00   0.89   0.04   0.05   0.01
## [3 ->]   0.00   0.02   0.90   0.06   0.02
## [4 ->]   0.00   0.01   0.02   0.94   0.02
## [5 ->]   0.00   0.01   0.02   0.07   0.90
cl1.3 <- cutree(clusterward1, k = 3)
cl1.3fac <- factor(cl1.3, labels = paste("Type", 1:3))
# Number of sequences in each cluster
table(cl1.3)
## cl1.3
## 1 2 3
## 2244 950 306
# seqrplot() displays a reduced, non-redundant set of representative sequences extracted from the state sequence object, sorted by a representativeness criterion
seqrplot(sample, diss = dist.om1, group = cl1.3fac, border = NA)
# seqdplot() plots the cross-sectional state frequencies at each position (time point)
seqdplot(sample, group = cl1.3fac, border = NA)
# seqfplot() displays the most frequent sequences, each as a horizontal stacked bar of its successive states
seqfplot(sample, group = cl1.3fac, border = NA)
# seqmtplot() displays the mean time spent in each state
seqmtplot(sample, group = cl1.3fac, border = NA)
# seqHtplot() displays how the cross-sectional entropy evolves over positions.
# Entropy is 0 when all cases are in the same state and maximal when every state holds the same proportion of cases,
# so it measures the diversity of states observed at a given time point.
seqHtplot(sample, group = cl1.3fac, border = NA)
cl1.4 <- cutree(clusterward1, k = 4)
cl1.4fac <- factor(cl1.4, labels = paste("Type", 1:4))
table(cl1.4)
## cl1.4
## 1 2 3 4
## 1936 950 308 306
seqrplot(sample, diss = dist.om1, group = cl1.4fac, border = NA)
seqdplot(sample, group = cl1.4fac, border = NA)
seqfplot(sample, group = cl1.4fac, border = NA)
seqmtplot(sample, group = cl1.4fac, border = NA)
seqHtplot(sample, group = cl1.4fac, border = NA)
cl1.5 <- cutree(clusterward1, k = 5)
cl1.5fac <- factor(cl1.5, labels = paste("Type", 1:5))
table(cl1.5)
## cl1.5
## 1 2 3 4 5
## 540 1396 950 308 306
seqrplot(sample, diss = dist.om1, group = cl1.5fac, border = NA)
seqdplot(sample, group = cl1.5fac, border = NA)
seqfplot(sample, group = cl1.5fac, border = NA)
seqmtplot(sample, group = cl1.5fac, border = NA)
seqHtplot(sample, group = cl1.5fac, border = NA)
cl1.6 <- cutree(clusterward1, k = 6)
cl1.6fac <- factor(cl1.6, labels = paste("Type", 1:6))
table(cl1.6)
## cl1.6
## 1 2 3 4 5 6
## 540 1396 488 462 308 306
seqrplot(sample, diss = dist.om1, group = cl1.6fac, border = NA)
seqdplot(sample, group = cl1.6fac, border = NA)
seqfplot(sample, group = cl1.6fac, border = NA)
seqmtplot(sample, group = cl1.6fac, border = NA)
seqHtplot(sample, group = cl1.6fac, border = NA)
cl1.7 <- cutree(clusterward1, k = 7)
cl1.7fac <- factor(cl1.7, labels = paste("Type", 1:7))
table(cl1.7)
## cl1.7
## 1 2 3 4 5 6 7
## 540 1226 488 462 308 306 170
seqrplot(sample, diss = dist.om1, group = cl1.7fac, border = NA)
seqdplot(sample, group = cl1.7fac, border = NA)
seqfplot(sample, group = cl1.7fac, border = NA)
seqmtplot(sample, group = cl1.7fac, border = NA)
seqHtplot(sample, group = cl1.7fac, border = NA)
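# Reduce the plot margins (helps avoid "figure margins too large" errors when many panels are drawn)
par(mar = c(1, 1, 1, 1))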
cl1.8 <- cutree(clusterward1, k = 8)
cl1.8fac <- factor(cl1.8, labels = paste("Type", 1:8))
table(cl1.8)
## cl1.8
## 1 2 3 4 5 6 7 8
## 306 1226 488 462 234 308 306 170
seqrplot(sample, diss = dist.om1, group = cl1.8fac, border = NA)
seqdplot(sample, group = cl1.8fac, border = NA)
seqfplot(sample, group = cl1.8fac, border = NA)
seqmtplot(sample, group = cl1.8fac, border = NA)
seqHtplot(sample, group = cl1.8fac, border = NA)
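The cluster sizes printed above show that each additional cluster comes from splitting one existing cluster: 2244 = 1936 + 308 from k = 3 to k = 4, 1936 = 540 + 1396 from k = 4 to k = 5, and so on. The sketch below illustrates this nesting and shows how the repeated `cutree()` calls could be collapsed into one. It runs on toy data with a base R `hclust` tree; the object `hc` is a hypothetical stand-in for `clusterward1`, which may have been built differently.

```r
# Toy stand-in (hypothetical) for the Ward tree used above
set.seed(3)
x <- matrix(rnorm(200), ncol = 2)
hc <- hclust(dist(x), method = "ward.D2")

# A vector of k values returns one membership column per solution,
# with the k values as column names
cuts <- cutree(hc, k = 3:8)

# Cluster sizes for every solution in one pass
sizes <- lapply(3:8, function(k) table(cuts[, as.character(k)]))
names(sizes) <- paste0("k=", 3:8)
sizes

# Cross-tabulating consecutive solutions: every k = 4 cluster sits inside
# a single k = 3 cluster, so exactly one k = 3 cluster has been split
tab <- table(k3 = cuts[, "3"], k4 = cuts[, "4"])
tab
```

The same loop could also create the `factor(..., labels = paste("Type", 1:k))` grouping variables and call the plotting functions for each k, which would avoid copying the block six times.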